Provable and practical approximations for the degree distribution using sublinear graph samples

Authors

  • Talya Eden
  • Shweta Jain
  • Ali Pinar
  • Dana Ron
  • Seshadhri Comandur

Abstract

The degree distribution is one of the most fundamental properties used in the analysis of massive graphs. There is a large literature on graph sampling, where the goal is to estimate properties (especially the degree distribution) of a large graph through a small, random sample. The degree distribution estimation poses a significant challenge, due to its heavy-tailed nature and the large variance in degrees. We design a new algorithm, SADDLES, for this problem, using recent mathematical techniques from the field of sublinear algorithms. The SADDLES algorithm gives provably accurate outputs for all values of the degree distribution. For the analysis, we define two fatness measures of the degree distribution, called the h-index and the z-index. We prove that SADDLES is sublinear in the graph size when these indices are large. A corollary of this result is a provably sublinear algorithm for any degree distribution bounded below by a power law. We deploy our new algorithm on a variety of real datasets and demonstrate its excellent empirical behavior. In all instances, we get extremely accurate approximations for all values in the degree distribution by observing at most 1% of the vertices. This is a major improvement over the state-of-the-art sampling algorithms, which typically sample more than 10% of the vertices to give comparable results. We also observe that the h and z-indices of real graphs are large, validating our theoretical analysis.

ACM Reference format: Talya Eden, Shweta Jain, Ali Pinar, Dana Ron, and C. Seshadhri. 2016. Provable and practical approximations for the degree distribution using sublinear graph samples. In Proceedings of , , , 12 pages.
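
To make the quantities in the abstract concrete, here is a minimal Python sketch. It is not the SADDLES algorithm, which the abstract does not describe in detail. It rests on two assumptions: that the h-index is defined by analogy with the bibliometric h-index (the largest h such that at least h vertices have degree at least h), and that the estimated quantity is the number of vertices with degree at least d. The function names `h_index`, `ccdh`, and `estimate_ccdh` are hypothetical, and the estimator is plain uniform vertex sampling, used only as a naive baseline.

```python
import random


def h_index(degrees):
    """Largest h such that at least h vertices have degree >= h.

    Assumption: defined by analogy with the bibliometric h-index;
    the abstract names the measure but does not define it.
    """
    ranked = sorted(degrees, reverse=True)
    h = 0
    for rank, d in enumerate(ranked, start=1):
        if d >= rank:
            h = rank
        else:
            break
    return h


def ccdh(degrees, d):
    """Exact number of vertices with degree >= d."""
    return sum(1 for x in degrees if x >= d)


def estimate_ccdh(degrees, d, sample_size, rng):
    """Naive baseline: sample vertices uniformly (with replacement)
    and scale the fraction with degree >= d up to the full graph.
    This is NOT SADDLES, only a simple point of comparison.
    """
    n = len(degrees)
    hits = sum(
        1 for _ in range(sample_size) if degrees[rng.randrange(n)] >= d
    )
    return n * hits / sample_size


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy heavy-tailed degree sequence: many low-degree vertices, few hubs.
    degrees = [1] * 9000 + [10] * 900 + [100] * 90 + [1000] * 10
    print("h-index:", h_index(degrees))
    for d in (10, 100, 1000):
        exact = ccdh(degrees, d)
        approx = estimate_ccdh(degrees, d, sample_size=100, rng=rng)
        print(f"d={d}: exact={exact}, estimate from 1% sample={approx}")
```

In a heavy-tailed sequence like the toy one above, very few vertices sit in the high-degree tail, so a small uniform sample tends to hit them rarely or not at all. That illustrates why the abstract calls tail estimation a significant challenge for sampling.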




Journal:
  • CoRR

Volume abs/1710.08607

Publication date: 2017